Abstract
Introduction: Phylogenetic inference provides insight into the evolutionary mechanisms that shape hematopoietic progenitor populations over an individual lifetime. By performing whole-genome sequencing (WGS) on isogenic colonies derived from single cells, it is possible to identify somatic mutations and use them to reconstruct phylogenetic trees. Individual somatic mutations can then be stratified to individual branches on the tree, empowering rigorous inferences of evolutionary parameters such as mutation age and fitness effects.
Unfortunately, contemporary methods for localizing somatic mutations on the phylogeny are limited by the infinite sites model of evolution, which for simplicity assumes that each mutation occurred only once, precluding the possibility of complex events (e.g., recurrent mutation). While most blood somatic mutations conform to this assumption, a recent survey of 39 hematopoietic phylogenies (N = 9,658 total progenitors) showed that exceptions grow as functions of donor age, number of clones, and environmental exposures such as chemotherapy (Chapman et al. 2025, Nature), thus motivating methodological development. Here, we adapt a Bayesian method called stochastic character mapping (SCM) to relax the assumption of infinite sites and reveal complex histories of hematopoiesis.
Methods: SCM is a generalizable Bayesian framework for inferring the number, timing, and placement of character state transitions with respect to a phylogeny. Here, we adapted SCM to infer the origin(s) of somatic mutations given a phylogenetic tree of hematopoietic progenitors. SCM uses simulation to approximate the posterior distribution of evolutionary histories for each genetic variant given (a) the observed genotype states and confidence metrics, (b) a phylogeny with branch lengths measured in molecular time, and (c) a genotype substitution model. Using the posterior distribution of histories for each variant, SCM yields a posterior probability for every genotype state at every node on the phylogenetic tree, enabling probabilistic assignment of mutation events. Importantly, SCM does not assume the infinite sites model of evolution, permitting recovery of complex evolutionary histories. As a demonstration of our method, we applied SCM to colony-based WGS data from hematopoietic progenitors collected from 3 donors (N = 149 total progenitors; DeBoy et al. 2023, NEJM).
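To illustrate the general idea, the following is a minimal sketch of the node-state sampling step that underlies stochastic character mapping. It is not the implementation used in this work. It assumes a hypothetical two-state substitution model (0 = reference, 1 = mutant) with illustrative rates `a` (gain) and `b` (loss), a toy four-tip tree, and made-up tip likelihoods standing in for genotype confidence. A full SCM would additionally simulate the timing of changes within each branch; here only branch endpoints are sampled, so the count of gains is a lower bound.

```python
import math
import random

def transition_probs(a, b, t):
    """Transition matrix P(t) for a two-state Markov chain with gain rate a and loss rate b."""
    s = a + b
    e = math.exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

def prune(tree, tip_lik, node, a, b, cond):
    """Felsenstein pruning: cond[node][s] = P(tip data below node | node is in state s)."""
    if node not in tree:  # tips carry their own genotype likelihoods
        cond[node] = tip_lik[node]
        return
    lik = [1.0, 1.0]
    for child, t in tree[node]:
        prune(tree, tip_lik, child, a, b, cond)
        P = transition_probs(a, b, t)
        for s in (0, 1):
            lik[s] *= sum(P[s][c] * cond[child][c] for c in (0, 1))
    cond[node] = lik

def sample_node_states(tree, cond, root, prior, a, b, rng):
    """Draw one joint assignment of node states from the posterior, root to tips."""
    w = [prior[s] * cond[root][s] for s in (0, 1)]
    states = {root: rng.choices((0, 1), weights=w)[0]}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, t in tree.get(node, []):
            P = transition_probs(a, b, t)
            w = [P[states[node]][c] * cond[child][c] for c in (0, 1)]
            states[child] = rng.choices((0, 1), weights=w)[0]
            stack.append(child)
    return states

def count_gains(tree, states):
    """Number of branches whose endpoints go from reference (0) to mutant (1)."""
    return sum(1 for node in tree for child, _ in tree[node]
               if states[node] == 0 and states[child] == 1)

def fraction_recurrent(tree, tip_lik, root, prior, a, b, n=2000, seed=0):
    """Fraction of posterior samples in which the variant arose on two or more branches."""
    cond = {}
    prune(tree, tip_lik, root, a, b, cond)
    rng = random.Random(seed)
    hits = sum(count_gains(tree, sample_node_states(tree, cond, root, prior, a, b, rng)) >= 2
               for _ in range(n))
    return hits / n

# Hypothetical toy data: the variant is seen in tips A and C, which are not sisters.
TREE = {"root": [("X", 1.0), ("Y", 1.0)],
        "X": [("A", 1.0), ("B", 1.0)],
        "Y": [("C", 1.0), ("D", 1.0)]}
MUT, REF = [1e-4, 1 - 1e-4], [1 - 1e-4, 1e-4]  # P(data | state 0), P(data | state 1)
TIP_LIK = {"A": MUT, "B": REF, "C": MUT, "D": REF}
ROOT_PRIOR = [1.0, 0.0]  # assumption: the root (zygote) carries the reference allele

if __name__ == "__main__":
    print(fraction_recurrent(TREE, TIP_LIK, "root", ROOT_PRIOR, a=0.05, b=1e-4))
```

In this toy setting the mutant allele appears in two non-sister tips, a pattern that a single mutation under infinite sites cannot produce without invoking a genotyping error. The sampled histories therefore concentrate on two independent gains, and the returned fraction plays the role of a posterior probability for a complex history.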
Results: In addition to 66,780 somatic single nucleotide polymorphisms (SNPs) exhibiting simple histories of a single mutation event, SCM recovered 426 somatic SNPs with strong evidence (posterior probability ≥ 0.95) of complex evolutionary histories and rejected an additional 377 SNPs that were initially called as somatic but are better explained by unsampled germline heterozygosity. Among SNPs with complex evolutionary histories, 51.4% (n = 219/426) were C>T transitions, a common mutation signature associated with aging. Strikingly, 40.6% (n = 173/426) of complex histories were present in >2 contemporary samples, implicating local mutation rate heterogeneity or an association with clonal expansion. Using GENCODE (version 34), we also found that over half (58.2%; n = 248/426) of all somatic SNPs with complex histories could be associated with a gene. In each donor, we identified at least one somatic SNP that both had a complex evolutionary history and fell within a gene with known relevance to CHIP and/or hematologic malignancy (CUX1 c.e3-8755G>A, JAK2 p.V617F, LRP1B c.e67-3397A>G, and ETNK1 p.N155S). This observation suggests that some previous reports of CHIP-driver-independent oligoclonality may be explained by overconservative models of molecular evolution.
Conclusions: SCM offers unprecedented resolution into the complex histories of somatic evolution (e.g., recurrent mutation) in the hematopoietic progenitor population over an individual lifetime. Through reanalysis of colony-based WGS data, we recovered mutations with complex histories in phylogenies from each of 3 donors, including variants within genes of known relevance to myeloproliferative disease. Forthcoming applications of our method to an expansive cohort of donors will provide valuable insight into the evolutionary and genetic mechanisms that shape hematopoietic lineages with age, disease, and therapeutic strategies. We anticipate that the relevance of our approach will grow over the coming decades as the scale and resolution of single-cell WGS methods continue to improve.